Classes and Data Modeling

What You’ll Learn

Why raw dictionaries-of-strings from CSV files become a problem as your code grows
How to use Python classes to model real-world things like players, courses, rounds, and shots
The difference between regular classes and dataclasses, and when to use each
How to load CSV data into typed, well-structured objects
How to add behavior (methods) to your data models
How inheritance works and when it is (and is not) worth using
How to model nested JSON data from a real iPhone shot-tracking app

Concept

What Is a Data Model?

A data model is the way you represent real-world things in code. Every time you read a CSV file with csv.DictReader, each row comes back as a dictionary in which every value is a string:

{'player_id': '1', 'name': 'Bear Woods', 'handicap': '2.1'}

This works for quick scripts, but it has real problems:

No type safety. player['handicap'] is the string '2.1', not a float. You have to remember to convert it every time you use it, and if you forget, Python will happily compare strings character by character: '9.5' < '25.3' evaluates to False because '9' sorts after '2', even though 9.5 is less than 25.3.
No discoverability. If you hand someone a dictionary, they have to guess what keys it contains. There is no autocomplete, no documentation, no way to know whether the key is 'handicap' or 'hcp' or 'handicap_index' without reading the CSV header.
No behavior. A dictionary cannot calculate a handicap differential or format a scorecard. You end up writing standalone functions that take dictionaries as arguments, and the connection between the data and the operations on that data is purely in your head.
No validation. Nothing prevents you from creating a player dict with a negative handicap or a missing name. Bugs show up far from where they were introduced.

A proper data model solves all of these problems by giving your data structure, types, and behavior.

Mapping Real-World Things to Code

Think about what we are modeling in our golf dataset:

A Player has an ID, a name, and a handicap.
A Course has an ID, a name, a city, a state, a slope rating, and a course rating.
A Hole belongs to a course and has a hole number, par, yardage, and handicap index.
A Round records that a specific player played a specific course on a specific date, with a total score and weather conditions.
A Shot records one swing within a round: the hole, shot number, club used, where the ball started, where it ended, and the strokes gained value.

Each of these is a noun in the golf domain, and each has specific attributes (the data it carries) and behavior (things you can compute from it). In Python, we model nouns as classes and behavior as methods.

Classes vs Dataclasses

Python gives you two main ways to define a class:

Regular classes require you to write __init__, __repr__, __eq__, and other boilerplate methods by hand. This gives you full control but involves a lot of repetitive code.

Dataclasses (introduced in Python 3.7) automatically generate __init__, __repr__, __eq__, and more based on the fields you declare. You just list the attributes and their types, and Python does the rest.

Feature	Regular Class	Dataclass
`__init__` generated?	No, you write it	Yes, automatic
`__repr__` generated?	No, you write it	Yes, automatic
`__eq__` generated?	No, you write it	Yes, automatic
Type annotations?	Optional	Required (used to define fields)
Best for	Complex behavior, custom initialization	Data containers, records, domain objects

Rule of thumb for data science: Start with dataclasses. They cover most of what you need. Switch to a regular class only if you need custom __init__ logic or complex internal state management.

The Domain Model

When you define a set of classes that represent the key concepts in your problem area, you have created a domain model. For our golf data, the domain model is the collection of Player, Course, Hole, Round, and Shot classes, along with the relationships between them (a Round references a Player and a Course; a Shot belongs to a Round).

A good domain model makes your code read like a description of the problem. Instead of:

score_diff = int(row['total_score']) - float(course_dict['course_rating'])

you can write:

score_diff = golf_round.total_score - course.course_rating

Same logic, but the second version is self-documenting.

Code

1. A Basic Class: Player

Let’s start by building a Player class the traditional way, with a manual __init__ method. We will also add __repr__ (for developer-friendly output) and __str__ (for user-friendly output).

class Player:
    """A golfer with an ID, name, and handicap."""

    def __init__(self, player_id, name, handicap):
        self.player_id = player_id
        self.name = name
        self.handicap = handicap

    def __repr__(self):
        """Developer-friendly representation (shows up in the REPL and debugger)."""
        return f'Player(player_id={self.player_id}, name={self.name!r}, handicap={self.handicap})'

    def __str__(self):
        """User-friendly representation (shows up when you print)."""
        return f'{self.name} (handicap {self.handicap})'


# Create an instance
bear = Player(player_id=1, name='Bear Woods', handicap=2.1)

print('repr:', repr(bear))
print('str: ', str(bear))
print()
print('Accessing attributes:')
print(f'  Name:     {bear.name}')
print(f'  Handicap: {bear.handicap}')
print(f'  Type of handicap: {type(bear.handicap)}')

Compare this to a raw dictionary from csv.DictReader:

# This is what csv.DictReader gives us
bear_dict = {'player_id': '1', 'name': 'Bear Woods', 'handicap': '2.1'}

print('Dictionary handicap:', bear_dict['handicap'], type(bear_dict['handicap']))
print('Class handicap:     ', bear.handicap, type(bear.handicap))
print()

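# The string comparison trap: strings compare character by character,
# so '9.5' sorts after '25.3' (9.5 is a made-up handicap for illustration)
print('String comparison: "9.5" < "25.3" =>', '9.5' < '25.3')
print('Float comparison:  9.5 < 25.3    =>', 9.5 < 25.3)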
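
The same trap shows up when sorting. The sketch below uses made-up handicap values (not taken from the course data) to show how string ordering and numeric ordering diverge:

```python
# Hypothetical handicap values, as csv.DictReader would return them (strings)
raw_handicaps = ['9.5', '25.3', '2.1']

# Strings sort character by character, so '25.3' lands before '9.5'
sorted_as_strings = sorted(raw_handicaps)

# Converting to float first gives the numeric order
sorted_as_floats = sorted(float(h) for h in raw_handicaps)

print('As strings:', sorted_as_strings)
print('As floats: ', sorted_as_floats)
```

Converting once, at load time, means no later comparison or sort has to remember to do it.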

2. Instance Methods: Adding Behavior

Classes become powerful when you attach methods – functions that operate on the instance’s data. Let’s add two methods to Player:

format_handicap() returns a nicely formatted string like "+2.1" or "+25.3"
handicap_differential(score, course_rating, slope_rating) computes the USGA handicap differential formula:

\[\text{differential} = \frac{(\text{score} - \text{course rating}) \times 113}{\text{slope rating}}\]

The number 113 is the standard slope rating (the slope of a course of average difficulty).
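
As a worked example, plugging in the round used in the code below (a score of 85 on a course with course rating 71.1 and slope rating 117) gives:

\[\text{differential} = \frac{(85 - 71.1) \times 113}{117} = \frac{1570.7}{117} \approx 13.4\]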

class Player:
    """A golfer with an ID, name, and handicap."""

    def __init__(self, player_id, name, handicap):
        self.player_id = player_id
        self.name = name
        self.handicap = handicap

    def __repr__(self):
        return f'Player(player_id={self.player_id}, name={self.name!r}, handicap={self.handicap})'

    def __str__(self):
        return f'{self.name} ({self.format_handicap()})'

    def format_handicap(self):
        """Return the handicap as a formatted string like '+2.1'."""
        return f'+{self.handicap:.1f}'

    def handicap_differential(self, score, course_rating, slope_rating):
        """Calculate the USGA handicap differential for a single round.

        Formula: (score - course_rating) * 113 / slope_rating
        """
        return round((score - course_rating) * 113 / slope_rating, 1)


bear = Player(1, 'Bear Woods', 2.1)
print(bear)
print()

# Bear shot 85 at North Park (course rating 71.1, slope 117)
diff = bear.handicap_differential(score=85, course_rating=71.1, slope_rating=117)
print(f'Bear shot 85 at North Park:')
print(f'  Handicap differential: {diff}')

3. Dataclasses: Eliminating Boilerplate

Look at how much code we wrote just for the __init__ and __repr__ methods above. For a data container like Player, this is pure boilerplate. The dataclasses module generates it for us.

The @dataclass decorator reads the class’s type-annotated fields and automatically generates __init__, __repr__, and __eq__.

from dataclasses import dataclass


@dataclass
class Player:
    """A golfer with an ID, name, and handicap."""
    player_id: int
    name: str
    handicap: float

    def format_handicap(self):
        """Return the handicap as a formatted string like '+2.1'."""
        return f'+{self.handicap:.1f}'

    def handicap_differential(self, score, course_rating, slope_rating):
        """Calculate the USGA handicap differential for a single round."""
        return round((score - course_rating) * 113 / slope_rating, 1)


bear = Player(player_id=1, name='Bear Woods', handicap=2.1)

# __repr__ is generated automatically
print(repr(bear))

# __eq__ is generated automatically -- compares all fields
bear2 = Player(player_id=1, name='Bear Woods', handicap=2.1)
print(f'bear == bear2: {bear == bear2}')

# Our custom methods still work
print(f'Formatted handicap: {bear.format_handicap()}')

Dataclass features: defaults and frozen

Dataclass fields can have default values. Fields with defaults must come after fields without defaults (just like function arguments).

The frozen=True option makes instances immutable – you cannot change their attributes after creation. This is useful for data that should not be accidentally modified.

@dataclass(frozen=True)
class CourseInfo:
    """Immutable course data -- cannot be modified after creation."""
    name: str
    slope_rating: int
    course_rating: float
    city: str = 'Pittsburgh'
    state: str = 'PA'


north_park = CourseInfo(
    name='North Park Golf Course',
    slope_rating=117,
    course_rating=71.1
)
print(north_park)
print(f'City defaults to: {north_park.city}')
print()

# Try to modify a frozen dataclass
try:
    north_park.slope_rating = 999
except AttributeError as e:
    print(f'Cannot modify frozen dataclass: {e}')
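
Two related details are worth a quick sketch. To "change" a frozen instance, dataclasses.replace() builds a modified copy, and a mutable default such as a list has to be declared with field(default_factory=...). The Scorecard class and the re-rated slope value below are hypothetical, introduced only for illustration:

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class CourseInfo:
    """Immutable course data (same fields as the example above)."""
    name: str
    slope_rating: int
    course_rating: float
    city: str = 'Pittsburgh'
    state: str = 'PA'


@dataclass
class Scorecard:
    """Hypothetical class, used only to illustrate default_factory."""
    player_name: str
    # A bare `= []` default raises ValueError when the class is defined;
    # default_factory builds a fresh list for every instance instead.
    hole_scores: list = field(default_factory=list)


north_park = CourseInfo(name='North Park Golf Course', slope_rating=117, course_rating=71.1)

# replace() returns a new instance; the frozen original is untouched
re_rated = replace(north_park, slope_rating=120)
print(north_park.slope_rating, re_rated.slope_rating)

card_a = Scorecard('Bear Woods')
card_b = Scorecard('Another Player')
card_a.hole_scores.append(4)
print(card_a.hole_scores, card_b.hole_scores)  # the two lists are independent
```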

4. Building the Golf Domain Model

Now let’s define dataclasses for every entity in our golf dataset. Each class mirrors one CSV file. We will keep them simple – just data containers with proper types.

from dataclasses import dataclass


@dataclass
class Player:
    """A golfer."""
    player_id: int
    name: str
    handicap: float

    def format_handicap(self):
        return f'+{self.handicap:.1f}'

    def handicap_differential(self, score, course_rating, slope_rating):
        """USGA handicap differential: (score - CR) * 113 / slope."""
        return round((score - course_rating) * 113 / slope_rating, 1)


@dataclass
class Course:
    """A golf course."""
    course_id: int
    name: str
    city: str
    state: str
    slope_rating: int
    course_rating: float


@dataclass
class Hole:
    """A single hole on a course."""
    course_id: int
    hole_number: int
    par: int
    yardage: int
    handicap_index: int


@dataclass
class Round:
    """A recorded round of golf."""
    round_id: int
    player_id: int
    course_id: int
    date: str
    total_score: int
    weather: str


@dataclass
class Shot:
    """A single shot within a round."""
    round_id: int
    hole: int
    shot_number: int
    club: str
    start_lie: str
    start_distance_to_pin: float
    end_lie: str
    end_distance_to_pin: float
    strokes_gained: float


print('Domain model defined: Player, Course, Hole, Round, Shot')

Notice how each field has a clear type. When we load data from CSV, we will convert strings to the proper types at load time, and from that point on everything is typed correctly.
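
One caveat: dataclasses do not enforce their type annotations at runtime, so Player(1, 'Bear Woods', '2.1') would be accepted with a string handicap. If you want the model to defend itself, a __post_init__ method is one option. The sketch below is a hypothetical variant of Player; the specific rules (non-empty name, non-negative handicap) are assumptions chosen to match the validation problems described earlier, not requirements of the dataset:

```python
from dataclasses import dataclass


@dataclass
class ValidatedPlayer:
    """Hypothetical variant of Player that checks its own values."""
    player_id: int
    name: str
    handicap: float

    def __post_init__(self):
        # Runs automatically right after the generated __init__
        if not self.name.strip():
            raise ValueError('name must not be empty')
        if not isinstance(self.handicap, (int, float)):
            raise TypeError(
                f'handicap must be a number, got {type(self.handicap).__name__}'
            )
        if self.handicap < 0:
            # Assumed rule for this sketch only
            raise ValueError(f'handicap must not be negative, got {self.handicap}')


ok = ValidatedPlayer(1, 'Bear Woods', 2.1)
print(ok)

try:
    ValidatedPlayer(2, 'Bad Row', '2.1')  # a string slipped through from a CSV
except TypeError as e:
    error_message = str(e)
    print('Rejected:', error_message)
```

With a check like this, a bad value fails at the moment the object is created, close to where the bug was introduced.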

5. Loading CSV Data into Dataclasses

The bridge between raw CSV data and our domain model is a loading function (or a @classmethod factory). The pattern is:

Read the CSV with csv.DictReader (which gives dictionaries of strings).
For each row, convert strings to proper types and create a dataclass instance.
Return a list of typed objects.

We will use @classmethod factory methods so each class knows how to construct itself from a CSV row.

import csv
from dataclasses import dataclass
from pathlib import Path


DATA_DIR = Path('../../data')


@dataclass
class Player:
    player_id: int
    name: str
    handicap: float

    @classmethod
    def from_csv_row(cls, row):
        """Create a Player from a csv.DictReader row."""
        return cls(
            player_id=int(row['player_id']),
            name=row['name'],
            handicap=float(row['handicap']),
        )

    def format_handicap(self):
        return f'+{self.handicap:.1f}'

    def handicap_differential(self, score, course_rating, slope_rating):
        return round((score - course_rating) * 113 / slope_rating, 1)


@dataclass
class Course:
    course_id: int
    name: str
    city: str
    state: str
    slope_rating: int
    course_rating: float

    @classmethod
    def from_csv_row(cls, row):
        return cls(
            course_id=int(row['course_id']),
            name=row['name'],
            city=row['city'],
            state=row['state'],
            slope_rating=int(row['slope_rating']),
            course_rating=float(row['course_rating']),
        )


@dataclass
class Hole:
    course_id: int
    hole_number: int
    par: int
    yardage: int
    handicap_index: int

    @classmethod
    def from_csv_row(cls, row):
        return cls(
            course_id=int(row['course_id']),
            hole_number=int(row['hole_number']),
            par=int(row['par']),
            yardage=int(row['yardage']),
            handicap_index=int(row['handicap_index']),
        )


@dataclass
class Round:
    round_id: int
    player_id: int
    course_id: int
    date: str
    total_score: int
    weather: str

    @classmethod
    def from_csv_row(cls, row):
        return cls(
            round_id=int(row['round_id']),
            player_id=int(row['player_id']),
            course_id=int(row['course_id']),
            date=row['date'],
            total_score=int(row['total_score']),
            weather=row['weather'],
        )


@dataclass
class Shot:
    round_id: int
    hole: int
    shot_number: int
    club: str
    start_lie: str
    start_distance_to_pin: float
    end_lie: str
    end_distance_to_pin: float
    strokes_gained: float

    @classmethod
    def from_csv_row(cls, row):
        return cls(
            round_id=int(row['round_id']),
            hole=int(row['hole']),
            shot_number=int(row['shot_number']),
            club=row['club'],
            start_lie=row['start_lie'],
            start_distance_to_pin=float(row['start_distance_to_pin']),
            end_lie=row['end_lie'],
            end_distance_to_pin=float(row['end_distance_to_pin']),
            strokes_gained=float(row['strokes_gained']),
        )


print('All dataclasses defined with from_csv_row() factory methods.')

Now let’s write a generic loader function and use it to load all our data.

def load_csv(filepath, cls):
    """Read a CSV file and return a list of dataclass instances.

    Args:
        filepath: Path to the CSV file.
        cls: A dataclass with a from_csv_row() classmethod.

    Returns:
        A list of instances of cls.
    """
    # newline='' is what the csv module documentation recommends for reader input
    with open(filepath, 'r', newline='') as f:
        reader = csv.DictReader(f)
        return [cls.from_csv_row(row) for row in reader]


# Load all the data
players = load_csv(DATA_DIR / 'players.csv', Player)
courses = load_csv(DATA_DIR / 'courses.csv', Course)
holes = load_csv(DATA_DIR / 'holes.csv', Hole)
rounds = load_csv(DATA_DIR / 'rounds.csv', Round)
shots = load_csv(DATA_DIR / 'shots.csv', Shot)

print(f'Players: {len(players)}')
print(f'Courses: {len(courses)}')
print(f'Holes:   {len(holes)}')
print(f'Rounds:  {len(rounds)}')
print(f'Shots:   {len(shots)}')
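
If the data files are not at hand, the same read-convert-construct pattern can be tried on an in-memory CSV. This is a minimal sketch: load_csv_from_text is a hypothetical helper, and the sample rows are made up apart from the Bear Woods row shown earlier.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Player:
    player_id: int
    name: str
    handicap: float

    @classmethod
    def from_csv_row(cls, row):
        return cls(int(row['player_id']), row['name'], float(row['handicap']))


def load_csv_from_text(text, cls):
    """Hypothetical helper: like load_csv, but reads from a string."""
    reader = csv.DictReader(io.StringIO(text))
    return [cls.from_csv_row(row) for row in reader]


# Made-up sample data for illustration
SAMPLE = """player_id,name,handicap
1,Bear Woods,2.1
2,Sample Player,25.3
"""

sample_players = load_csv_from_text(SAMPLE, Player)
print(sample_players)

# A malformed value fails at load time, at the row that caused it
try:
    load_csv_from_text('player_id,name,handicap\n3,Broken Row,n/a\n', Player)
except ValueError as e:
    load_error = str(e)
    print('Bad row rejected:', load_error)
```

Because the conversion happens inside from_csv_row, a bad value surfaces while loading instead of later in the analysis.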

Now let’s see the benefit. Every attribute is the correct type, and the objects print clearly.

# Inspect the loaded objects
for p in players:
    print(p)

print()

# Types are correct
bear = players[0]
print(f'{bear.name} handicap: {bear.handicap} (type: {type(bear.handicap).__name__})')
print()

# We can sort players by handicap -- no string conversion needed
sorted_players = sorted(players, key=lambda p: p.handicap)
print('Players sorted by handicap (best to worst):')
for p in sorted_players:
    print(f'  {p.name:20s} {p.format_handicap()}')

# Inspect courses
for c in courses:
    print(f'{c.name:30s} ({c.city}, {c.state})  Slope: {c.slope_rating}  CR: {c.course_rating}')

# Build lookup dictionaries for quick access by ID
player_lookup = {p.player_id: p for p in players}
course_lookup = {c.course_id: c for c in courses}

# Now we can resolve a round to actual objects
r = rounds[0]
player = player_lookup[r.player_id]
course = course_lookup[r.course_id]

print(f'Round {r.round_id}: {player.name} played {course.name} on {r.date}')
print(f'  Score: {r.total_score}  |  Weather: {r.weather}')
print(f'  Relative to course rating: {r.total_score - course.course_rating:+.1f}')

6. Adding Behavior: Methods on Domain Objects

Now that we have typed objects and lookup dictionaries, we can compute useful golf statistics. Let’s derive course totals from the holes data and create a RoundAnalyzer helper class.

Course total par from its holes

A course’s total par is the sum of the par values for all 18 (or 9) holes. Let’s compute this from the holes data.

# Group holes by course
holes_by_course = {}
for h in holes:
    if h.course_id not in holes_by_course:
        holes_by_course[h.course_id] = []
    holes_by_course[h.course_id].append(h)


def course_total_par(course_id):
    """Calculate the total par for a course from its holes."""
    return sum(h.par for h in holes_by_course[course_id])


def course_total_yardage(course_id):
    """Calculate the total yardage for a course from its holes."""
    return sum(h.yardage for h in holes_by_course[course_id])


for c in courses:
    par = course_total_par(c.course_id)
    yards = course_total_yardage(c.course_id)
    print(f'{c.name:30s}  Par {par}  |  {yards:,} yards  |  Slope {c.slope_rating}  |  CR {c.course_rating}')
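
The functions above take a course_id and rely on the module-level holes_by_course dictionary. An alternative, sketched below, is composition: a course object that holds its own holes and exposes the totals as properties. CourseWithHoles is a hypothetical class and the three holes are made up; it is not part of the course's domain model.

```python
from dataclasses import dataclass, field


@dataclass
class Hole:
    course_id: int
    hole_number: int
    par: int
    yardage: int
    handicap_index: int


@dataclass
class CourseWithHoles:
    """Hypothetical variant of Course that owns its Hole objects."""
    course_id: int
    name: str
    holes: list = field(default_factory=list)

    @property
    def total_par(self):
        return sum(h.par for h in self.holes)

    @property
    def total_yardage(self):
        return sum(h.yardage for h in self.holes)


# Three made-up holes, just to exercise the properties
demo = CourseWithHoles(course_id=99, name='Demo Course', holes=[
    Hole(99, 1, 4, 380, 7),
    Hole(99, 2, 3, 165, 15),
    Hole(99, 3, 5, 510, 1),
])
print(f'{demo.name}: par {demo.total_par}, {demo.total_yardage:,} yards')
```

The trade-off is that the object no longer mirrors a single CSV row, so the loader has to attach the holes after reading both files.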

RoundAnalyzer: computing scoring breakdowns

Let’s build a class that takes a round and its associated shot data and computes useful statistics. This is a good example of a class that is not a pure data container – it exists to provide behavior.

class RoundAnalyzer:
    """Analyzes a single round of golf."""

    SCORING_NAMES = {
        -3: 'albatross', -2: 'eagle', -1: 'birdie',
        0: 'par', 1: 'bogey', 2: 'double bogey', 3: 'triple bogey',
    }

    def __init__(self, round_obj, round_shots, course, holes_for_course):
        self.round = round_obj
        self.shots = round_shots
        self.course = course
        self.holes = {h.hole_number: h for h in holes_for_course}

    def strokes_per_hole(self):
        """Return a dict mapping hole_number -> number of strokes."""
        counts = {}
        for s in self.shots:
            counts[s.hole] = counts.get(s.hole, 0) + 1
        return counts

    def relative_to_par(self):
        """Return total score relative to course par."""
        total_par = sum(h.par for h in self.holes.values())
        return self.round.total_score - total_par

    def scoring_breakdown(self):
        """Return a dict counting birdies, pars, bogeys, etc."""
        breakdown = {}
        for hole_num, strokes in self.strokes_per_hole().items():
            par = self.holes[hole_num].par
            diff = strokes - par
            label = self.SCORING_NAMES.get(diff, f'+{diff}' if diff > 0 else str(diff))
            breakdown[label] = breakdown.get(label, 0) + 1
        return breakdown

    def total_strokes_gained(self):
        """Return the total strokes gained for the round."""
        return round(sum(s.strokes_gained for s in self.shots), 2)

    def print_scorecard(self, player):
        """Print a formatted scorecard."""
        rtp = self.relative_to_par()
        rtp_str = f'+{rtp}' if rtp > 0 else ('E' if rtp == 0 else str(rtp))

        print(f'=== {player.name} at {self.course.name} ({self.round.date}) ===')
        print(f'Score: {self.round.total_score} ({rtp_str})  |  Weather: {self.round.weather}')
        print(f'Total Strokes Gained: {self.total_strokes_gained()}')
        print()

        breakdown = self.scoring_breakdown()
        print('Scoring breakdown:')
        order = ['albatross', 'eagle', 'birdie', 'par', 'bogey', 'double bogey', 'triple bogey']
        for label in order:
            if label in breakdown:
                print(f'  {label:>15s}: {breakdown[label]}')
        # Print any labels not in the standard order
        for label, count in breakdown.items():
            if label not in order:
                print(f'  {label:>15s}: {count}')


print('RoundAnalyzer class defined.')

# Group shots by round_id
shots_by_round = {}
for s in shots:
    if s.round_id not in shots_by_round:
        shots_by_round[s.round_id] = []
    shots_by_round[s.round_id].append(s)


# Analyze Bear Woods' first round (round_id=1, North Park)
r = rounds[0]  # round_id=1
player = player_lookup[r.player_id]
course = course_lookup[r.course_id]

analyzer = RoundAnalyzer(
    round_obj=r,
    round_shots=shots_by_round[r.round_id],
    course=course,
    holes_for_course=holes_by_course[r.course_id],
)

analyzer.print_scorecard(player)

# Print a summary for every round
print(f'{"Player":20s} {"Course":30s} {"Date":12s} {"Score":>5s} {"vs Par":>7s} {"SG":>7s}')
print('-' * 85)

for r in rounds:
    player = player_lookup[r.player_id]
    course = course_lookup[r.course_id]
    analyzer = RoundAnalyzer(r, shots_by_round[r.round_id], course, holes_by_course[r.course_id])
    rtp = analyzer.relative_to_par()
    rtp_str = f'+{rtp}' if rtp > 0 else ('E' if rtp == 0 else str(rtp))
    sg = analyzer.total_strokes_gained()

    print(f'{player.name:20s} {course.name:30s} {r.date:12s} {r.total_score:>5d} {rtp_str:>7s} {sg:>+7.2f}')

Handicap differentials per player

Now we can use Player.handicap_differential() with real data to show each player’s differentials across their rounds.

for p in players:
    player_rounds = [r for r in rounds if r.player_id == p.player_id]
    diffs = []
    for r in player_rounds:
        c = course_lookup[r.course_id]
        diff = p.handicap_differential(r.total_score, c.course_rating, c.slope_rating)
        diffs.append(diff)

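    if not diffs:
        # A player with no recorded rounds has no average to compute
        print(f'{p.name:20s} (no rounds recorded)')
        print()
        continue

    avg_diff = sum(diffs) / len(diffs)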
    print(f'{p.name:20s} (handicap {p.format_handicap()})')
    print(f'  Differentials: {diffs}')
    print(f'  Average: {avg_diff:.1f}')
    print()

7. Inheritance (Brief)

Inheritance lets one class extend another. The child class gets all the parent’s attributes and methods, and can add or override them.

In data science work, inheritance is less common than in traditional software engineering. Dataclasses and composition (building objects that contain other objects) usually cover what you need. But it is worth knowing the concept.

@dataclass
class GolfRecord:
    """Base class for any record that has an ID."""
    record_id: int

    def describe(self):
        return f'{self.__class__.__name__} #{self.record_id}'


@dataclass
class TournamentRound(GolfRecord):
    """A round played in a tournament -- extends GolfRecord."""
    player_name: str
    course_name: str
    score: int
    tournament_name: str = 'Pittsburgh Open'


tr = TournamentRound(
    record_id=1,
    player_name='Bear Woods',
    course_name='North Park Golf Course',
    score=85,
)

print(tr)
print(tr.describe())  # inherited from GolfRecord
print(f'Is TournamentRound a GolfRecord? {isinstance(tr, GolfRecord)}')
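
The example above only adds fields in the child class. Overriding a method works the same way as in regular classes; the sketch below uses a trimmed-down TournamentRound (fewer fields than the one above, for brevity) whose describe() extends the parent's version through super():

```python
from dataclasses import dataclass


@dataclass
class GolfRecord:
    record_id: int

    def describe(self):
        return f'{self.__class__.__name__} #{self.record_id}'


@dataclass
class TournamentRound(GolfRecord):
    player_name: str
    score: int

    def describe(self):
        # Reuse the parent's text, then add child-specific detail
        base = super().describe()
        return f'{base}: {self.player_name} shot {self.score}'


tr = TournamentRound(record_id=1, player_name='Bear Woods', score=85)
description = tr.describe()
print(description)
```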

Inheritance creates an “is-a” relationship: a TournamentRound is a GolfRecord. This can be useful for shared behavior, but it also creates tight coupling between classes.

For data science work, prefer:

Dataclasses for data containers (Player, Course, Round, etc.)
Composition for complex behavior (RoundAnalyzer has a Round, not is a Round)
Inheritance only when you have a genuine “is-a” relationship with shared behavior

You will rarely need deep inheritance hierarchies. If you find yourself building class trees more than two levels deep, step back and consider whether a simpler approach would work.

8. Connecting to the iPhone App Model

Our data directory also contains shot-tag-round.json, which is a real export from an iPhone shot-tracking app. The JSON structure is nested: a round contains shots, each shot has a club object and GPS coordinates.

Without a data model, you end up writing code like:

data['shots'][0]['club']['name']  # what even is this?

Let’s define dataclasses that mirror the JSON structure and load the data into typed objects.

import json


# First, let's see what raw JSON access looks like
with open(DATA_DIR / 'shot-tag-round.json', 'r') as f:
    raw_data = json.load(f)

print('Top-level keys:', list(raw_data.keys()))
print(f'Course: {raw_data["courseName"]}')
print(f'Number of shots: {len(raw_data["shots"])}')
print()

# Accessing nested data with raw dicts is fragile and hard to read
first_shot = raw_data['shots'][0]
print('First shot (raw dict):')
print(f'  Club: {first_shot["club"]["name"]}')
print(f'  Lat:  {first_shot["coordinate"]["latitude"]}')
print(f'  Lon:  {first_shot["coordinate"]["longitude"]}')

@dataclass
class Coordinate:
    """A GPS coordinate."""
    latitude: float
    longitude: float

    @classmethod
    def from_dict(cls, data):
        return cls(latitude=data['latitude'], longitude=data['longitude'])


@dataclass
class ClubInfo:
    """Club details from the shot-tracking app."""
    code: str
    family: str
    club_id: int
    name: str

    @classmethod
    def from_dict(cls, data):
        return cls(
            code=data['code'],
            family=data['family'],
            club_id=data['id'],
            name=data['name'],
        )


@dataclass
class ShotTagShot:
    """A single shot from the iPhone app."""
    shot_id: str
    club: ClubInfo
    coordinate: Coordinate
    course_name: str
    horizontal_accuracy: float
    timestamp: float

    @classmethod
    def from_dict(cls, data):
        return cls(
            shot_id=data['id'],
            club=ClubInfo.from_dict(data['club']),
            coordinate=Coordinate.from_dict(data['coordinate']),
            course_name=data['courseName'],
            horizontal_accuracy=data['horizontalAccuracy'],
            timestamp=data['timestamp'],
        )


@dataclass
class ShotTagRound:
    """A round exported from the iPhone shot-tracking app."""
    round_id: str
    course_name: str
    start_date: float
    end_date: float
    hole_boundaries: list
    hole_pin_locations: list  # list of Coordinate
    shots: list  # list of ShotTagShot

    @classmethod
    def from_dict(cls, data):
        return cls(
            round_id=data['id'],
            course_name=data['courseName'],
            start_date=data['startDate'],
            end_date=data['endDate'],
            hole_boundaries=data['holeBoundaries'],
            hole_pin_locations=[Coordinate.from_dict(loc) for loc in data['holePinLocations']],
            shots=[ShotTagShot.from_dict(s) for s in data['shots']],
        )


print('ShotTagRound, ShotTagShot, ClubInfo, and Coordinate dataclasses defined.')

# Load the JSON into our typed model
with open(DATA_DIR / 'shot-tag-round.json', 'r') as f:
    raw_data = json.load(f)

app_round = ShotTagRound.from_dict(raw_data)

print(f'Course: {app_round.course_name}')
print(f'Holes tracked: {len(app_round.hole_pin_locations)}')
print(f'Total shots: {len(app_round.shots)}')
print(f'Hole boundaries: {app_round.hole_boundaries}')
print()

# Now accessing nested data is clean and readable
first_shot = app_round.shots[0]
print(f'First shot:')
print(f'  Club: {first_shot.club.name} ({first_shot.club.code})')
print(f'  Family: {first_shot.club.family}')
print(f'  Location: ({first_shot.coordinate.latitude:.6f}, {first_shot.coordinate.longitude:.6f})')
print(f'  Accuracy: {first_shot.horizontal_accuracy:.1f} meters')

# Use the hole boundaries to group shots by hole
boundaries = app_round.hole_boundaries

print(f'Pin locations and shots per hole:')
print(f'{"Hole":>4s}  {"Shots":>5s}  {"Pin Lat":>12s}  {"Pin Lon":>12s}  Clubs Used')
print('-' * 75)

for i in range(len(boundaries) - 1):
    hole_num = i + 1
    start_idx = boundaries[i]
    end_idx = boundaries[i + 1]
    hole_shots = app_round.shots[start_idx:end_idx]
    pin = app_round.hole_pin_locations[i]
    clubs = [s.club.code for s in hole_shots]

    print(f'{hole_num:>4d}  {len(hole_shots):>5d}  {pin.latitude:>12.6f}  {pin.longitude:>12.6f}  {" -> ".join(clubs)}')

# Handle the last hole (from last boundary to end of shots)
last_hole = len(boundaries)
last_shots = app_round.shots[boundaries[-1]:]
last_pin = app_round.hole_pin_locations[-1]
last_clubs = [s.club.code for s in last_shots]
print(f'{last_hole:>4d}  {len(last_shots):>5d}  {last_pin.latitude:>12.6f}  {last_pin.longitude:>12.6f}  {" -> ".join(last_clubs)}')
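
Typed coordinates also make it natural to attach behavior. As a sketch, a distance_to method (a hypothetical addition, not part of the model defined above) could estimate how far a shot location is from the pin using the haversine formula. The Earth radius is an approximation and the coordinates below are made up, not taken from shot-tag-round.json:

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)


@dataclass
class Coordinate:
    latitude: float
    longitude: float

    def distance_to(self, other):
        """Great-circle distance to another Coordinate, in meters (haversine)."""
        lat1 = math.radians(self.latitude)
        lat2 = math.radians(other.latitude)
        dlat = lat2 - lat1
        dlon = math.radians(other.longitude - self.longitude)
        a = (math.sin(dlat / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
        return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Made-up coordinates for illustration
tee = Coordinate(40.6000, -80.0000)
pin = Coordinate(40.6030, -80.0000)
print(f'Tee to pin: {tee.distance_to(pin):.0f} m')
```

At golf-hole distances the result is only as good as the GPS fix, which is why the app also records horizontal_accuracy.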

Compare the typed access above to what the raw dict version would look like:

# Raw dict -- hard to read, no autocomplete, no type safety
raw_data['shots'][0]['club']['name']
raw_data['holePinLocations'][0]['latitude']

# Typed dataclass -- clear, discoverable, self-documenting
app_round.shots[0].club.name
app_round.hole_pin_locations[0].latitude

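The dataclass version is not just cleaner. Because every attribute is declared in one place, editors and static type checkers such as mypy can flag a misspelled attribute before the code runs, and a missing or misspelled JSON key fails once, at load time inside from_dict, rather than somewhere deep in the analysis.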

AI

Exercise 1: Ask AI to Design a Golf Domain Model

Give an AI assistant the following description and ask it to design a data model:

Prompt to use:

I have a golf tracking application. Players play rounds at courses. Each course has 18 holes with different pars and yardages. Each round records the date, weather, and total score. Within each round, every shot is tracked with the club used, where the ball started (lie and distance to pin), where it ended, and a strokes gained value. Design a Python data model using dataclasses to represent this domain.

Evaluate the AI’s response:

Did it identify the right entities? (Player, Course, Hole, Round, Shot)
Did it use appropriate types? (int for IDs, float for distances, str for names)
Did it over-engineer? Common over-engineering: adding an Enum for weather or lie types, creating abstract base classes, adding methods you did not ask for, or building an ORM-style relationship system.
Did it miss anything? For example, did it include slope rating and course rating on the Course, or just par?
How does it compare to the model we built above?

# Paste the AI-generated data model here and compare it to ours.
# Note what the AI got right, what it missed, and what it over-engineered.

Exercise 2: Ask AI to Calculate Handicap Index

Give the AI our Player dataclass and ask it to add a method that calculates a handicap index from a list of rounds.

Prompt to use:

Here is my Player dataclass:
@dataclass
class Player:
    player_id: int
    name: str
    handicap: float

    def handicap_differential(self, score, course_rating, slope_rating):
        return round((score - course_rating) * 113 / slope_rating, 1)
Add a method calculate_handicap_index that takes a list of (score, course_rating, slope_rating) tuples representing recent rounds and returns the USGA handicap index. The formula is: take the best 8 differentials out of the most recent 20, average them, and multiply by 0.96.

Evaluate the AI’s response:

Does it correctly take the best (lowest) 8 differentials?
Does it handle the case where the player has fewer than 20 rounds? (The USGA has a table for this, but a reasonable simplification is to take the best N/2 differentials if fewer than 20 rounds are available.)
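Does it multiply by 0.96, as the prompt asks? (The prompt’s formula is a hybrid: under the World Handicap System in use since 2020, the index is the average of the best 8 of the most recent 20 differentials with no 0.96 multiplier, while the earlier USGA system multiplied the average of the best 10 of 20 by 0.96. A careful AI response may point this out.)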
Does it reuse the existing handicap_differential method, or rewrite the formula?

# Paste the AI-generated method here.
# Test it with real data from our rounds to see if the output makes sense.

Exercise 3: Ask AI to Convert Dataclasses to/from JSON

Prompt to use:

I have these Python dataclasses for golf data:
@dataclass
class Player:
    player_id: int
    name: str
    handicap: float

@dataclass
class Round:
    round_id: int
    player_id: int
    course_id: int
    date: str
    total_score: int
    weather: str
Write functions to serialize a list of these objects to JSON and deserialize them back. Handle the Round’s date field as a proper date object (not just a string).

Evaluate the AI’s response:

Does it handle serialization? (json.dumps cannot serialize dataclass instances by default – the AI needs to provide a solution like dataclasses.asdict() or a custom encoder.)
Does it handle deserialization? (JSON gives back plain dicts – the AI needs to reconstruct dataclass instances.)
Does it handle the date field? (Converting between str and datetime.date requires datetime.strptime or date.fromisoformat().)
Does it handle nested objects? (If you include a Player reference inside Round, can the AI serialize/deserialize that relationship?)
Is the solution overly complex? (Using dataclasses.asdict() is much simpler than writing a custom JSON encoder from scratch.)

# Paste the AI-generated serialization code here.
# Test round-tripping: serialize to JSON, deserialize back,
# and verify the result matches the original.

Summary

When to Use Dicts vs Classes vs Dataclasses

Approach	Best For	Tradeoffs
Plain dict	Quick scripts, throwaway analysis, exploring data for the first time	No type safety, no discoverability, no validation, no behavior
Regular class	Complex initialization logic, internal state management, custom `__init__` behavior	More boilerplate, full control
Dataclass	Data containers, domain models, records, anything where the primary purpose is carrying data with proper types	Minimal boilerplate, auto-generated `__init__`/`__repr__`/`__eq__`, can still add methods
Frozen dataclass	Immutable records, lookup data, configuration that should not change	Same as dataclass but prevents accidental mutation

The progression in this course:

Topics 01-04: Plain dicts from csv.DictReader (fine for learning loops, comprehensions, etc.)
Topic 05 (this one): Dataclasses to give structure and types to our data
Topics 06+: Pandas DataFrames for tabular analysis (a different kind of structure, optimized for columnar operations)

Key Syntax Reference

Regular class:

class Player:
    def __init__(self, player_id, name, handicap):
        self.player_id = player_id
        self.name = name
        self.handicap = handicap

    def __repr__(self):
        return f'Player({self.player_id}, {self.name!r}, {self.handicap})'

Dataclass:

from dataclasses import dataclass

@dataclass
class Player:
    player_id: int
    name: str
    handicap: float

Frozen dataclass (immutable):

@dataclass(frozen=True)
class CourseInfo:
    name: str
    slope_rating: int
    course_rating: float

Factory classmethod:

@classmethod
def from_csv_row(cls, row):
    return cls(
        player_id=int(row['player_id']),
        name=row['name'],
        handicap=float(row['handicap']),
    )

Generic CSV loader:

def load_csv(filepath, cls):
    with open(filepath, 'r', newline='') as f:
        return [cls.from_csv_row(row) for row in csv.DictReader(f)]

Next up: Topic 06 – Pandas and Exploratory Data Analysis.
